RR-92-20-ONR 


AD-A248  327 




HOW  TO  EQUATE  TESTS  WITH 
LITTLE  OR  NO  DATA 


Robert  J.  Mislevy 
Kathleen  M.  Sheehan 
Marilyn  Wingersky 


This  research  was  sponsored  in  part  by  the 
Cognitive  Science  Program 
Cognitive  and  Neural  Sciences  Division 
Office  of  Naval  Research,  under 
Contract  No.  N00014-88-K-0304 
R&T  4421552 


Robert  J.  Mislevy,  Principal  Investigator 

Educational  Testing  Service 
Princeton,  New  Jersey 


Reproduction  in  whole  or  in  part  is  permitted 
for  any  purpose  of  the  United  States  Government. 


Approved  for  public  release;  distribution  unlimited. 




92-08857 


REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB No. 0704-0188




1. AGENCY USE ONLY (Leave blank)  2. REPORT DATE

February,  1992 


4. TITLE AND SUBTITLE


3. REPORT TYPE AND DATES COVERED

Final 


How  to  Equate  Tests  with  Little  or  No  Data 


6. AUTHOR(S)

Robert  J.  Mislevy,  Kathleen  M.  Sheehan  & 
Marilyn  Wingersky 


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Educational  Testing  Service 
Rosedale  Road 
Princeton, NJ 08541


9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

Cognitive Sciences
Code 1142CS

Office  of  Naval  Research 
Arlington,  VA  22217-5000 


12a. DISTRIBUTION/AVAILABILITY STATEMENT


5. FUNDING NUMBERS

G. N00014-88-K-0304
PE. 61153N
PR. RR 04204

TA. RR 04204-01
WU. R&T 4421552


8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 


10.  SPONSORING  MONITORING 
AGENCY  REPORT  NUMBER 


12b  DISTRIBUTION  CODE 


Unclassified/Unlimited 




Standard  procedures  for  equating  tests,  including  those  based  on  item  response 
theory  (IRT),  require  item  responses  from  large  numbers  of  examinees.  Such  data 
may  not  be  forthcoming  for  reasons  theoretical,  political,  or  practical. 
Information  about  items'  operating  characteristics  may  be  available  from  other 
sources,  however,  such  as  content  and  format  specifications,  expert  opinion,  or 
psychological  theories  about  the  skills  and  strategies  required  to  solve  them. 

This  paper  shows  how,  in  the  IRT  framework,  collateral  information  about  items  can 
be  exploited  to  augment  or  even  replace  examinee  responses  when  linking  or 
equating  new  tests  to  established  scales.  The  procedures  are  illustrated  with  data 
from  the  Pre-Professional  Skills  Test  (PPST). 


14. SUBJECT TERMS

Bayesian  estimation,  cognitive  processes,  collateral 
information,  equating,  item  response  theory 


15. NUMBER OF PAGES
46 + RDP


16.  PRICE  CODE 

N/A 


17. SECURITY CLASSIFICATION OF REPORT
18. SECURITY CLASSIFICATION OF THIS PAGE
19. SECURITY CLASSIFICATION OF ABSTRACT
20. LIMITATION OF ABSTRACT


Unclassified 


Unclassified 


Unclassified 


How  to  Equate  Tests  with 
Little  or  No  Data 


Robert J. Mislevy, Kathleen M. Sheehan, and Marilyn Wingersky
Educational  Testing  Service 

February,  1992 


This work was supported in part by Contract No. N00014-88-K-0304, R&T 4421552,
from the Cognitive Science Program, Cognitive and Neural Sciences Division, Office of
Naval Research. We are grateful to Neil Dorans, Charlie Lewis, Martha Stocking, and
Michael Zieky, the editor, and an anonymous referee for helpful discussions and
comments. The analyses of the Pre-Professional Skills Test (PPST) were supported by
ETS development funds, and were carried out in collaboration with Louann Benton, Kalle
Gerritz, Robin Huffman, Nancy Petersen, Clyde Reese, Duanli Yan, and June Zack.


How  to  Equate  Tests  with  Little  or  No  Data 


Abstract 

Standard  procedures  for  equating  tests,  including  those  based  on 
item  response  theory  (IRT),  require  item  responses  from  large  numbers  of 
examinees.  Such  data  may  not  be  forthcoming  for  reasons  theoretical, 
political,  or  practical.  Information  about  items'  operating  characteristics 
may  be  available  from  other  sources,  however,  such  as  content  and  format 
specifications,  expert  opinion,  or  psychological  theories  about  the  skills  and 
strategies  required  to  solve  them.  This  paper  shows  how,  in  the  IRT 
framework,  collateral  information  about  items  can  be  exploited  to  augment 
or  even  replace  examinee  responses  when  linking  or  equating  new  tests  to 
established  scales.  The  procedures  are  illustrated  with  data  from  the  Pre- 
Professional  Skills  Test  (PPST). 

Key  words:  Bayesian  estimation,  cognitive  processes,  collateral 
information,  equating,  item  response  theory 




How  to  Equate  Tests  with  Little  or  No  Data 


Selection  and  placement  testing  programs  update  their  tests  periodically,  as  the 
specific  content  of  the  items  becomes  obsolete  or  familiar  to  prospective  examinees. 
Because  the  new  test  forms  may  differ  in  difficulty  or  accuracy  even  if  they  tap  the  same 
underlying  skills  as  the  old  forms,  some  kind  of  “equating”  or  “linking”  is  required  to 
compare  results  across  forms  (Angoff,  1984).  Standard  procedures,  including  those  based 
on item response theory (IRT), require examinee responses to both new items and items
already  linked  to  an  established  scale.1  One  can  determine  levels  of  comparable 
performance  on  new  and  old  test  forms  to  any  desired  degree  of  accuracy  by  increasing  the 
number  of  examinees  in  the  linking  sample. 

Two  disparate  developments  in  educational  measurement  can  prevent  gathering  the 
data  that  standard  equating  procedures  require.  First,  current  legislative  activity  in  New 
York  is  intended  to  limit  the  administration  of  nonoperational  items  in  that  state,  including 
those  used  in  pretesting  and  equating.  Second,  the  growing  interest  in  modeling  the 
cognitive  processes  of  solving  test  items  (Embretson,  1985)  and  the  capability  of 
microcomputers  to  construct  tasks  around  cognitively  salient  features  (Bejar,  1985;  Irvine, 
Dann,  &  Anderson,  in  press)  raise  the  possibility  of  custom-building  test  items  for  each 
examinee  on  the  spot. 

Although  operational  equating  procedures  rely  solely  upon  examinee  responses, 
researchers  have  been  aware  for  some  time  of  alternative  sources  of  information  about  the 
operating characteristics of test items. Lorge and Kruglov (1952, 1953), for example,
investigated  the  degree  to  which  expert  and  novice  judges  could  predict  the  difficulties  of 
arithmetic  test  items,  and  Guttman  (1959)  predicted  partial  orderings  and  relationships 

1 If Test A is administered to Group A and Test B to Group B, the tests can be equated if
either  (1)  tests  A  and  B  contain  common  items,  (2)  Groups  A  and  B  overlap,  or  (3)  Groups 
A  and  B  are  representative  samples  from  the  same  population  of  examinees  (Lord,  1982). 


Equating  with  Little  or  No  Data 

Page  2 

among  inter-item  correlations  between  racial-attitude  items  constructed  according  to  a  facet 
design.  More  recent  studies  with  a  psychometric  orientation  have  examined  the  degree  to 
which  IRT  parameters  can  be  predicted  from  educationally-relevant  features  of  items  (e.g., 
Fischer,  1973;  Tatsuoka,  1987),  and  others  with  a  psychological  perspective  have  focused 
on  task  attributes  that  are  important  in  cognitive  processing  models  (e.g.,  Whitely,  1976). 
The moderate to high relationships between item features and operating characteristics are
of  considerable  theoretical  importance,  as  a  framework  for  assessing  test  validity  and  for 
constructing  tests  around  principles  of  learning  and  knowing. 

But  moderate  to  high  relationships  between  item  features  and  operating 
characteristics  are  the  information  equivalent  of  small  to  moderate  examinee  samples 
(Mislevy,  1988) — too  little  for  standard  large-sample  equating  procedures  to  work 
properly. And when it comes to test equating, collateral information differs from
response-data information in a crucial respect. Linking information from examinee responses can be
made  arbitrarily  accurate  by  increasing  the  sample  size,  but  information  from  collateral  data 
is  limited  by  the  strength  of  its  relationship  to  item  operating  characteristics.  Procedures 
have  not  been  available  to  provide  coherent  inferences  about  item  operating  characteristics, 
and  the  equating  and  linking  functions  they  imply,  from  data  that  contain  substantially  less 
information  than  large  samples  of  responses. 

The  present  paper  attacks  this  problem  for  domains  in  which  (i)  an  IRT  model  fits 
reasonably  well,  (ii)  available  collateral  information  about  test  items  is  correlated  with  their 
IRT  parameters,  and  (iii)  a  start-up  data  set  is  available  from  which  to  build  predictive 
distributions  for  item  parameters,  given  this  collateral  information.  The  key  idea  is  the 
treatment  of  the  uncertainty  associated  with  the  parameters  of  the  new  items.  The  following 
section  reviews  IRT  test  equating  and  linking  with  known  item  parameters.  Sources  of 
collateral information, and ways to bring it into the IRT framework, are then discussed.
An example from the Pre-Professional Skills Test (PPST) is introduced. Linking and



equating procedures are then extended to the case of imperfect knowledge about item
parameters,  and  illustrated  with  the  PPST  data. 

IRT  Linking  and  Equating 

An  item  response  theory  (IRT)  model  gives  the  probability  that  an  examinee  will 
make  a  particular  response  to  a  particular  test  item  as  a  function  of  unobservable  parameters 
for  that  examinee  and  that  item  (Hambleton,  1989).  This  paper  addresses  scalar  parametric 
models for dichotomous test items, but the ideas apply more generally. Define $F_j(\theta)$, the
item response function for Item j, as follows:

$$F_j(\theta) = P(x_j = 1 \mid \theta, \beta_j), \qquad (1)$$

where $x_j$ is the response to Item j, 1 for right and 0 for wrong; $\theta$ is the examinee ability
parameter; and $\beta_j$ is the (possibly vector-valued) parameter for Item j. Our example uses
the 3-parameter logistic IRT model:

$$F_j(\theta) = c_j + (1 - c_j)\,\Psi[a_j(\theta - b_j)];$$

here $\Psi$ is the logistic distribution function, or $\Psi(t) = (1 + \exp(-t))^{-1}$, and $\beta_j = (a_j, b_j, c_j)$
conveys the sensitivity of Item j, its difficulty, and the tendency of examinees with very
low values of $\theta$ to answer it correctly. Under the usual IRT assumption of local or
conditional independence, the probability of a vector of responses $x = (x_1, \ldots, x_n)$ to n items
is the product over items of terms based on (1):

$$p(x \mid \theta, B) = \prod_{j=1}^{n} F_j(\theta)^{x_j}\,[1 - F_j(\theta)]^{1 - x_j}, \qquad (2)$$

where $B = (\beta_1, \ldots, \beta_n)$.
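As a minimal sketch of equations (1) and (2) under the 3PL, the following Python fragment evaluates the item response function and the probability of a response vector. The item parameter values and the function names are hypothetical, chosen only for illustration; they are not taken from the PPST analyses.

```python
import math

def logistic(t):
    """Logistic distribution function Psi(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def irf_3pl(theta, a, b, c):
    """3PL item response function: c + (1 - c) * Psi(a * (theta - b))."""
    return c + (1.0 - c) * logistic(a * (theta - b))

def response_likelihood(theta, items, x):
    """Equation (2): product over items of F^x * (1 - F)^(1 - x)."""
    like = 1.0
    for (a, b, c), xj in zip(items, x):
        f = irf_3pl(theta, a, b, c)
        like *= f if xj == 1 else (1.0 - f)
    return like

# Hypothetical item parameters (a, b, c), for illustration only.
items = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.7, 0.25)]
x = [1, 1, 0]  # right, right, wrong
print(response_likelihood(0.0, items, x))
```

At $\theta = b_j$ the response function equals $c_j + (1 - c_j)/2$, which gives a quick check on any implementation.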

IRT  Linking  and  Equating  when  Item  Parameters  are  Known 

If item parameters were known, one way to compare performances on different
tests would be to make inferences on the $\theta$ scale, using an estimator such as the maximum


likelihood estimate or one of the Bayesian estimates described below. The varying degrees
of difficulty and accuracy among test forms are accounted for by the different parameters of
the items that comprise them. Equation (2) is interpreted as a likelihood function for $\theta$,
$L(\theta \mid x, B)$, once x has been observed. The value of $\theta$ that maximizes L is the maximum
likelihood estimate (MLE) $\hat\theta$. Its variance, $\mathrm{Var}(\hat\theta \mid \theta, B)$, can be approximated by the
negative reciprocal of the second derivative of log L evaluated at $\hat\theta$. The posterior density of
$\theta$ with respect to the prior density $p(\theta)$ is obtained as

$$p(\theta \mid x, B) \propto L(\theta \mid x, B)\, p(\theta). \qquad (3)$$

The mean of (3) is the Bayes mean estimate $\bar\theta$; the variance, $\mathrm{Var}(\theta \mid x, B)$, indicates the
remaining uncertainty. The mode of (3) is the Bayes modal estimate $\tilde\theta$.
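The estimators just described can be sketched on a grid of $\theta$ values. The fragment below is an illustration only: it assumes known (hypothetical) 3PL item parameters and a standard normal prior, and it approximates the MLE and the mean, mode, and variance of (3) by simple summation over the grid rather than by any production algorithm.

```python
import math

def irf_3pl(theta, a, b, c):
    """3PL item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def likelihood(theta, items, x):
    """Equation (2) read as a function of theta for fixed responses x."""
    like = 1.0
    for (a, b, c), xj in zip(items, x):
        f = irf_3pl(theta, a, b, c)
        like *= f if xj == 1 else 1.0 - f
    return like

def theta_estimates(items, x, lo=-4.0, hi=4.0, steps=801):
    """Grid approximations to the MLE and to the mean, mode, and variance of (3)."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    like = [likelihood(t, items, x) for t in grid]
    prior = [math.exp(-0.5 * t * t) for t in grid]  # assumed N(0, 1) prior, unnormalized
    post = [l * p for l, p in zip(like, prior)]
    total = sum(post)
    mle = grid[like.index(max(like))]
    mode = grid[post.index(max(post))]
    mean = sum(t * w for t, w in zip(grid, post)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, post)) / total
    return mle, mean, mode, var

# Hypothetical item parameters (a, b, c), for illustration only.
items = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.7, 0.25)]
print(theta_estimates(items, [1, 1, 0]))  # mixed response pattern
print(theta_estimates(items, [1, 1, 1]))  # all items correct
```

For the all-correct pattern the likelihood increases with $\theta$, so the grid MLE sits at the upper edge of the grid, while the Bayes mean remains finite because of the prior.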

Alternatively, the IRT model can be used to generate an equating function between
number-right or percent-correct scores on two tests, through “IRT true-score test equating”
(Dorans, 1990; Lord, 1980). The expected number-right score on Test A for an examinee
with proficiency $\theta$ is given by

$$\tau_A(\theta) = \sum_{j \in S_A} P(x_j = 1 \mid \theta, \beta_j) = \sum_{j \in S_A} F_j(\theta), \qquad (4)$$

where $S_A$ is the set of indices of items that appear in Test A. The expected score on Test
B, $\tau_B(\theta)$, is defined analogously. Scores on two tests are “true-score equated” if they are
the expected scores at the same value of $\theta$, and the IRT true-score equating line is the plot of
all pairs of equated Test A and Test B true scores: $\{(\tau_A(\theta), \tau_B(\theta))\}$ for $\theta \in (-\infty, +\infty)$.2
Note that the averaging that occurs in (4) is for fixed $\theta$, over the uncertainty associated with
the observational setting. Specifically, the uncertainty in scores for a given $\theta$ in standard
IRT true-score equating is the 0 or 1 for each $x_j$, with $\beta_j$ assumed known.
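A minimal sketch of IRT true-score equating with known item parameters follows. It inverts the Test A characteristic curve by bisection to find the $\theta$ behind a given true score, then evaluates the Test B curve at that $\theta$. The two short "tests" and all function names are hypothetical; Test B is simply Test A with every difficulty raised by 0.5.

```python
import math

def irf_3pl(theta, a, b, c):
    """3PL item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def true_score(theta, items):
    """Equation (4): expected number-right score at a given theta."""
    return sum(irf_3pl(theta, a, b, c) for a, b, c in items)

def theta_for_true_score(score, items, lo=-10.0, hi=10.0, tol=1e-10):
    """Invert the monotone test characteristic curve by bisection."""
    if not (true_score(lo, items) < score < true_score(hi, items)):
        raise ValueError("score is outside the range of true scores on this test")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if true_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equate_a_to_b(score_a, items_a, items_b):
    """Test B true score at the theta whose Test A true score is score_a."""
    return true_score(theta_for_true_score(score_a, items_a), items_b)

# Hypothetical tests: B has the same items as A, each made harder by 0.5.
items_a = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.7, 0.2)]
items_b = [(a, b + 0.5, c) for a, b, c in items_a]
print(equate_a_to_b(2.0, items_a, items_b))
```

Scores below the sum of the lower asymptotes have no corresponding $\theta$ under the 3PL, so the sketch raises an error for them rather than extrapolating.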

2 Under the 3PL, this relationship does not give equatings for scores below the sum of the
$c_j$s on a given test. The practical solution is generally to extend the relationship from the
lowest point on the true-score equating curve linearly down to (0,0).




Item  Parameter  Estimation 

But item parameters are never known with certainty; they must be estimated from
observable data of one kind or another (in practice, almost always from samples of
examinee responses). Bayesian inference about B (e.g., Mislevy, 1986; Tsutakawa & Lin,
1986) begins with a (possibly uninformative) prior distribution $p(B)$, a known or
concurrently estimated examinee population density $p(\theta)$, and a response matrix
$X = (x_1, \ldots, x_N)$ from a sample of N independently-responding examinees.3 The posterior
distribution of B is

$$p(B \mid X) \propto p(B)\, L(B \mid X), \qquad (5)$$

where $L(B \mid X)$ is the marginal likelihood function for the item parameters (Bock & Aitkin,
1981):

$$L(B \mid X) = \prod_{i=1}^{N} \int p(x_i \mid \theta_i, B)\, p(\theta_i)\, d\theta_i. \qquad (6)$$

One can obtain Bayes mean estimates $\bar{B}$ or Bayes modal estimates $\tilde{B}$, and a posterior
variance matrix $\Sigma_B$ from (5), leading to the approximations $p(B \mid X) \approx N(\bar{B}, \Sigma_B)$ or
$N(\tilde{B}, \Sigma_B)$. Alternatively, one obtains the MLE $\hat{B}$ by maximizing (6) with respect to B.

The consistency of $\bar{B}$, $\tilde{B}$, and $\hat{B}$ as estimators of B justifies using item parameter estimates
from large samples of examinees as if they were known true values in IRT linking and
scaling; e.g., using $L(\theta \mid x, B = \hat{B})$ for $L(\theta \mid x, B)$ when estimating $\theta$, or $P(x_j = 1 \mid \theta, B = \hat{B})$ for
$P(x_j = 1 \mid \theta, B)$ when calculating $\tau_A(\theta)$ and $\tau_B(\theta)$ in equating (Lord, 1982).
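The marginal likelihood in (6) can be sketched numerically by replacing the integral over $\theta$ with a weighted sum on a fixed grid. The fragment below is an illustration under stated assumptions: a standard normal population density, an evenly spaced grid with normalized weights, and hypothetical item parameters and responses. It is not the algorithm used by BILOG.

```python
import math

def irf_3pl(theta, a, b, c):
    """3PL item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def pattern_prob(x, theta, items):
    """Equation (2): probability of response pattern x at a given theta."""
    prob = 1.0
    for (a, b, c), xj in zip(items, x):
        f = irf_3pl(theta, a, b, c)
        prob *= f if xj == 1 else 1.0 - f
    return prob

def quadrature(n_points=61, lo=-5.0, hi=5.0):
    """Evenly spaced nodes with weights proportional to an assumed N(0, 1) density."""
    nodes = [lo + (hi - lo) * q / (n_points - 1) for q in range(n_points)]
    weights = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(weights)
    return nodes, [w / total for w in weights]

def marginal_prob(x, items, nodes, weights):
    """One examinee's term in (6): the pattern probability averaged over theta."""
    return sum(w * pattern_prob(x, t, items) for t, w in zip(nodes, weights))

def marginal_log_likelihood(X, items):
    """Log of (6): sum over examinees of log marginal pattern probabilities."""
    nodes, weights = quadrature()
    return sum(math.log(marginal_prob(x, items, nodes, weights)) for x in X)

# Hypothetical item parameters and a tiny response matrix, for illustration only.
items = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.7, 0.25)]
X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
print(marginal_log_likelihood(X, items))
```

Maximizing this function over the item parameters would give a marginal maximum likelihood estimate; adding a log prior for B before maximizing would give a Bayes modal estimate in the sense of (5).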

If B is not well determined (i.e., $p(B \mid \text{“data relevant to B”})$ is too spread out to be
approximated by a single-point density), this approximation understates the uncertainty
associated with subsequent inferences and, as we shall see, can yield biased estimates.


3 Independent priors are typically posited for B and $\theta$. Independent and identical priors
are  also  posited  for  examinees  in  this  presentation,  but  see  Mislevy  and  Sheehan  (1989a) 
on  the  role  of  collateral  information  about  examinees  in  item  parameter  estimation. 



“Data relevant to B” can be examinee responses (X), collateral information about the items
(Y), or both. B is poorly determined when the examinee sample is small, or when only
collateral information about the items is available. The preceding paragraphs addressed
$p(B \mid X)$; the following section addresses $p(B \mid Y)$ and $p(B \mid X, Y)$. We then return to methods
for dealing with uncertainty about B in linking and equating.

Collateral  Information  about  Items 

This section discusses potential sources of collateral information ($y_j$) about a test
item, and suggests ways to express this information in terms of distributions for the item
parameters $\beta_j$. We assume the existence of a start-up data set in which both collateral
information and item parameter estimates are available from a collection of items. The basic
steps are as follows:

1. Identify features of items that are useful in predicting item operating characteristics.

2. Characterize, analytically or empirically, distributions $p(\beta_j \mid y_j)$ based on data from
the previously administered items.

3. Employ the distributions obtained in Step 2 as prior distributions for the $\beta$s of new
items, conditional on their collateral data.

Sources  of  Collateral  Information 

Expert Judgment. Irving Lorge and his students studied the degree to which
experts’  predictions  of  item  difficulty  could  be  used  to  construct  parallel  test  forms  (Lorge 
&  Kruglov,  1952, 1953;  Tinkelman,  1947).  Raters  turned  out  to  be  good  at  predicting 
the  relative  difficulties  of  items,  but  not  absolute  levels  of  difficulty.  Thorndike  (1982) 
found  that  pooled  judgements  from  20  trained  raters  accounted  for  between  55-  and  71- 
percent  of  the  variance  in  item  difficulties  in  three  aptitude  tests — too  low,  he  concluded 
with  disappointment,  to  substitute  for  pretesting,  say,  a  thousand  examinees.  In  Chalifour 
and  Powers’  (1989)  study  of  analytical  reasoning  items  in  the  Graduate  Record 
Examination  (GRE),  an  experienced  item  writer’s  predictions  accounted  for  72-percent  of 



normalized item difficulty variance. Bejar (1983) found item writers' predictions accounted
for only about 20-percent of the variation among difficulties and among item-test
correlations in an English Usage test, and less still in a Sentence Correction test. In a
subsequent study of analogy items, test developers’ predictions accounted for 43-percent of
the variance among item difficulties (Enright & Bejar, 1989).

Test Specifications. Educational tests are written to tap skills and knowledge in a
domain of content. Osburn (1968) and Hively, Patterson, and Page (1968) suggested
building “item forms,” or templates to create items, around the important features of a
content  domain.  Researchers  have  developed  numerous  taxonomies  to  elucidate  the  content 
domains  that  tests  address  (e.g.,  Mayer,  1981;  Chaffin  &  Peirce,  1988).  Test 
specifications  can  also  address  item  formats  or  modalities.  Because  they  are  integral  to  the 
test  development  process,  content  and  format  specifications  constitute  a  readily  available 
source  of  collateral  information  about  items.  Whitely  (1976)  accounted  for  31-percent  of 
the  variance  among  percents-correct  of  verbal  analogy  items  with  a  taxonomy  of  types  of 
relationships.  Drum,  Calfee,  and  Cook  (1981)  accounted  for  between  55-  and  94-percent 
of  the  variance  in  percents-correct  in  18  reading  tests  with  “surface  features”  such  as 
proportion  of  content  words  in  stems,  length  of  distractors,  word  frequencies,  and 
syntactic  structures.  Chalifour  and  Powers  (1989)  accounted  for  62-percent  of  percents- 
correct  variation  and  46-percent  of  item  biserial  correlation  variation  among  GRE  analytical 
reasoning  items  with  seven  predictors,  including  the  number  of  rules  presented  in  a  puzzle 
and the number of rules actually required to solve it.

Cognitive  Processing  Requirements.  From  the  psychologist’s  point  of  view,  the 
salient  features  of  an  item  concern  the  operations,  strategy  requirements,  or  working 
memory load of anticipated attempts to solve it. Scheuneman, Gerritz, and Embretson
(1989)  accounted  for  about  65-percent  of  the  variance  in  item  difficulties  in  the  GRE 
Psychology  Achievement  Test  and  the  Reading  section  of  the  National  Teacher 
Examination with variables built around readability, semantic content, cognitive demand,



and  knowledge  demand.  Mitchell  (1983)  derived  collateral  information  variables  from 
theories  of  cognitive  processes  for  the  Word  Knowledge  (WK)  and  Paragraph 
Comprehension  (PC)  tests  of  the  Armed  Services  Vocational  Aptitude  Battery  (ASVAB), 
and used them to predict Rasch item difficulty parameters. The proportions of item
difficulty variance accounted for in three ASVAB forms ranged from 17- to 30-percent
for WK, and from 66- to 90-percent for PC.

Characterizing  Item  Parameter  Distributions 

Procedures for incorporating collateral information $y_j$ about test items into IRT
include Scheiblechner's (1972) and Fischer’s (1973) Linear Logistic Test Model (LLTM) and
Mislevy’s (1988) extension of it. The LLTM is a 1-parameter logistic (Rasch) IRT model
in which item difficulty parameters are linear functions of effects for key features of items:

$$\beta_j = \sum_{k=1}^{K} y_{kj}\,\eta_k,$$

where $\beta_j$ is the difficulty parameter of Item j; $\eta_k$ is the contribution of Feature k to item
difficulty, for $k = 1, \ldots, K$ salient item features; and $y_{kj}$, a known collateral information
variable, signifies the extent to which Feature k is represented in Item j. In Fischer's
(1973) calculus example, the collateral information about Item j was a vector of indicator
variables $y_{kj}$, for $k = 1, \ldots, 7$, denoting whether or not each of seven differentiation rules was
required in its solution.

Fischer  and  Formann  (1982)  list  many  applications  of  the  LLTM  in  which 
meaningful  item  features  account  for  substantial  proportions  of  item-difficulty  variance,  but 
they  note  that  the  original  goal  of  explaining  all  the  variation  among  item  difficulties  is 
never  met  in  realistic  applications.  Mislevy  (1988)  extended  the  LLTM  to  allow  for 
variation  of  difficulties  among  items  with  the  same  salient  features,  by  incorporating 
residuals around the LLTM estimate with variance $\phi^2$. If the prediction model is built using



a large number of previously-calibrated test items, a predictive distribution for the difficulty
parameter of a new item might thus be approximated as

$$p(\beta_j \mid y_j) \approx N\Big(\sum_{k=1}^{K} y_{kj}\,\eta_k,\ \phi^2\Big),$$

where $y_j = (y_{1j}, \ldots, y_{Kj})$ and $\phi^2$ is the variance of the residuals around the LLTM estimate.
The mean of the predictive distribution, $\bar\beta_j = \sum_k y_{kj}\,\eta_k$, is
essentially the LLTM point estimate for $\beta_j$. Note that information about new items from
collateral data can be combined with examinee responses to the same items via (5), as an
informative prior distribution, to yield $p(B \mid X, Y)$.
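The three steps for building predictive distributions can be sketched with ordinary least squares on a start-up set of calibrated difficulties. This is a simplification offered for illustration: it treats the difficulty estimates as known, ignores estimation error in the feature effects, and uses made-up feature codings and difficulties. It is not the joint estimation procedure of the LLTM or of Mislevy (1988).

```python
def fit_feature_effects(Y, beta):
    """Least-squares feature effects and residual variance for beta_j ~ sum_k y_kj * eta_k."""
    K = len(Y[0])
    # Normal equations (Y'Y) eta = Y'beta.
    A = [[sum(row[r] * row[c] for row in Y) for c in range(K)] for r in range(K)]
    v = [sum(row[r] * bj for row, bj in zip(Y, beta)) for r in range(K)]
    # Gaussian elimination with partial pivoting.
    for col in range(K):
        piv = max(range(col, K), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, K):
            f = A[r][col] / A[col][col]
            for c in range(col, K):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    # Back substitution.
    eta = [0.0] * K
    for r in reversed(range(K)):
        eta[r] = (v[r] - sum(A[r][c] * eta[c] for c in range(r + 1, K))) / A[r][r]
    resid = [bj - sum(y * e for y, e in zip(row, eta)) for row, bj in zip(Y, beta)]
    resid_var = sum(e * e for e in resid) / (len(Y) - K)
    return eta, resid_var

def predictive_distribution(y_new, eta, resid_var):
    """Mean and variance of a normal predictive distribution for a new item's difficulty."""
    return sum(y * e for y, e in zip(y_new, eta)), resid_var

# Hypothetical start-up set: a constant term plus two indicator features per item.
Y = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 1, 0]]
beta = [-0.6, 0.4, -0.1, 0.8, 0.2]  # made-up calibrated difficulties
eta, resid_var = fit_feature_effects(Y, beta)
print(predictive_distribution([1, 1, 1], eta, resid_var))
```

The returned mean and variance could then serve as the parameters of a normal prior for the new item's difficulty, to be used alone or combined with response data through (5).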

An  Example  from  the  PPST  (Part  1) 

The  Pre-Professional  Skills  Test  (PPST)  is  used  to  measure  the  reading, 
mathematics,  and  writing  skills  of  prospective  teachers  during  their  college  years.  Our 
example  concerns  the  reading  tests  from  eight  test  forms  administered  between  1985  and 
1990.  Each  form  comprised  forty  items,  although  one  or  two  items  were  excluded  from 
each  form  due  to  problems  with  the  item  or  the  scoring  key.  In  accordance  with  the  item 
overlap  design  used  in  the  PPST,  nearly  all  of  the  items  on  the  first  form  appeared  in  one  or 
more  later  forms;  the  last  two  forms  each  had  twenty  unique  items.  A  “baseline”  calibration 
of  the  144  unique  items  was  carried  out  under  the  3PL  with  a  sample  of  approximately 
5000  examinees  per  form,  using  Mislevy  and  Bock's  (1983)  BILOG  program.  A  second 
“operational”  calibration  was  carried  out  with  a  sample  of  only  500  examinees  each  for  the 
first  seven  forms  only,  using  only  the  103  items  that  did  not  appear  on  the  eighth  form. 

This  example  employs  a  collateral  information  model  built  on  the  seven-form  operational 
data  to  link  the  eighth  left-out  form  to  the  operational  scale.  The  results  obtained  with  the 
baseline  calibration  are  the  standard  of  evaluation.  Part  1  summarizes  the  building  of  the 
collateral  information  model,  and  demonstrates  the  shortcomings  of  using  the  resulting 
point  estimates  of  item  parameters  as  if  they  were  known  true  values. 



The conditional distributions of estimated item parameters in the seven-form
operational calibration were approximated with a multivariate multiple regression model.
The dependent variable was the item parameter vector (slope, intercept, lower asymptote),
or $\beta_j = (a_j, -(b_j/a_j), c_j)$, with a sample size of 100 items. An initial set of 30 collateral
variables consisted of codings of items’ content and cognitive processing features, as
proposed by a team of test developers familiar with the PPST. Two test developers rated
all items from all eight forms; the averages of their ratings were employed throughout. The
collateral variables included in the final prediction model were determined from separate
step-down regression analyses on $a_j$, $-(b_j/a_j)$, and $c_j$. For the predictors included in the
final model, descriptive summaries of the variables, proportions of rater agreement, and
the parameter values in the final multivariate regression model appear in Table 1.

[Insert  Table  1  about  here] 

The proportions of variance accounted for by the prediction model were .02, .24,
and .05 for the slopes, intercepts, and asymptotes. This corresponds to multiple Rs of .14,
.49, and .22. Figure 1 plots a, b, and c predictions for the 39 Form 8 items against the
baseline values. Considerable variation remains for individual item difficulty (b)
parameters, and the predictions for a and c parameters differ only negligibly from their
averages. Figure 2 presents the test characteristic curves (TCCs) for Form 8 as constructed
from the predictions and the baseline values. The TCCs give expected scores in the
percent-correct metric as a function of $\theta$. Much of the noise apparent in Figure 1 has been
“cancelled out” in Figure 2, as the predicted TCC is surprisingly close to the baseline TCC.
The discrepancy is systematic, however. Because only 24-percent of the variance among
item difficulties has been accounted for, the item difficulty point estimates are
too close to their mean. Items are modeled as more similar than they really are, causing the
predicted TCC to rise too sharply in this region. This problem affects the IRT true-score
equating. Figure 3 shows an equating curve based on operational estimates for Form 7 and



prediction-based  point  estimates  for  Form  8,  along  with  the  curve  obtained  using  baseline 
item  parameter  estimates  for  both  tests. 
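The arithmetic linking the reported proportions of variance to the multiple Rs can be checked directly, as in the short sketch below. The dictionary keys are labels chosen here for illustration. The comment about spread relies on a general property of least-squares regression with an intercept (within the fitting sample, the standard deviation of fitted values equals R times the standard deviation of the observed values), not on any additional result from the PPST analysis.

```python
import math

# Proportions of variance accounted for, as reported in the text.
r_squared = {"slope": 0.02, "intercept": 0.24, "asymptote": 0.05}

# Multiple R is the square root of the proportion of variance accounted for.
multiple_r = {name: math.sqrt(value) for name, value in r_squared.items()}

for name, r in multiple_r.items():
    # In a least-squares fit, predicted values have about R times the spread
    # of the observed values, which is why point predictions cluster near the mean.
    print(f"{name}: R = {r:.2f}")
# prints:
# slope: R = 0.14
# intercept: R = 0.49
# asymptote: R = 0.22
```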

[Insert  Figures  1-3  about  here] 

MLEs for $\theta$ and standard errors were calculated for a random sample of 250
examinees from Form 8, using baseline item parameters and prediction-based point
estimates. Figure 4 shows the $\hat\theta$s. A bias corresponding to the discrepancies in the TCCs
is apparent, especially at the higher end of the distribution. The scatter of the prediction-based
$\hat\theta$s around their baseline counterparts reflects increased uncertainty due to incomplete
information about item parameters, since the only difference between the two sets of
estimates is the item parameters used to calculate them. This variance is about .10. Figure
5 shows the relative change in modelled standard errors, or square roots of the variance
estimates $\mathrm{Var}(\hat\theta \mid \theta, B)$, when calculated with prediction-based point estimates of item
parameters in place of B as opposed to baseline values. The average change, about zero,4
is misleading, because the actual standard error of the $\theta$ estimates should be larger; simply
calculating $\mathrm{Var}(\hat\theta \mid \theta, B)$ with $\bar{B}$ in place of B neglects uncertainty about $\theta$s due to the
remaining uncertainty about item parameters. We shall see that ignoring this uncertainty
causes posterior variances for $\theta$s to be underestimated by about a third in this example.

Up to this point, we have seen that collateral variables do provide potentially useful
information about item parameters. A test characteristic curve and $\hat\theta$s calculated with
predicted item parameters, or $\bar\beta_j$s, are surprisingly good, given that multiple Rs for slopes,
intercepts, and lower asymptotes were only .14, .49, and .22. But the shortcomings of
these “best estimate” point predictions for item parameters are serious enough to prevent us
from simply using them as if they were true $\beta_j$ values. Biases in $\hat\theta$s appear because the $\bar\beta_j$s
are too clustered around their average. More seriously, disregarding the uncertainty about
item parameters causes substantial understatement of the uncertainty about $\theta$s. In this


4  The  curvature  is  due  to  the  clustering  of  predicted  item  difficulties  around  their  average. 



example, a variance component of .10, about half the average of the usual error variance
estimate for $\hat\theta$s, is being ignored.

[Insert  Figures  4  &  5  about  here] 

IRT  Linking  and  Equating  when  Item  Parameters 
Are  Not  Known  with  Certainty 

Consider inferences about $\theta$ with imperfect knowledge about B, conveyed through
$p(B \mid \text{data})$, where “data” refers to a calibration-sample X of responses from N examinees,
collateral information about items, or both. The probative value about $\theta$ from x is now
expressed through what is sometimes called an average likelihood function, which accounts
for uncertainty about B by averaging over its distribution:

$$L(\theta \mid x, \text{data concerning } B) = \int L(\theta \mid x, B)\, p(B \mid \text{data concerning } B)\, dB. \qquad (7)$$

Tsutakawa compared Bayesian inferences about $\theta$ using $p(B \mid X)$ and $B=\hat{B}$, under the 2-
and 3-parameter logistic models (the 2PL and 3PL). Under the 2PL, the more accurate
estimates of $\mathrm{Var}(\theta \mid x)$ using $p(B \mid X)$ were higher than the usual approximation,
$\mathrm{Var}(\theta \mid x, B=\hat{B})$, by an average of 4 percent with $N=400$, and up to 30 percent with $N=100$
(Tsutakawa & Soltys, 1988). Under the 3PL with $N=400$, increases ranged from 50
percent to over 1000 percent in unfavorable cases (Tsutakawa & Johnson, 1990).

Similarly, uncertainty about item parameters must be taken into account in IRT true-
score equating. For a fixed value of $\theta$, knowledge about the observed score distribution
must take into account uncertainty about item parameters as well as uncertainty about item
responses. This requires integrating over $p(B \mid \text{data})$ in (4) to obtain expected scores:

$$T_A^*(\theta) \equiv E[T_A(\theta)] = \sum_{j \in S_A} \int P(x_j=1 \mid \theta, \beta_j)\, p(\beta_j \mid \text{data})\, d\beta_j. \qquad (8)$$

The IRT true-score equating line now matches values of $T_A^*(\theta)$ and $T_B^*(\theta)$.




We note in passing that this extended definition of IRT true-score equating is
consistent with a familiar practice from true-score test theory: treating total scores with the
same value as equivalent when tests are random samples of items from the same pool.
“True score” in this case is defined as expected percent-correct in the pool, which is
naturally the expected percent-correct in a random sample of items. The fact that some
samples of items will be harder than others is accounted for by adding a between-forms
variance component to statements about the precision of student scores (Cronbach, Gleser,
Nanda, & Rajaratnam, 1972). This component can be reduced if, instead of simple
random sampling, stratified sampling according to content specifications is used to select
items; that is, prespecified numbers of items are selected from “bins” of similar items.

Items may not be literally drawn from an existing pool, but conceptually sampled through
the process of writing tests to the same content specifications. This presentation extends
the idea to tests constructed with possibly different numbers of items from different bins.

Numerical procedures to carry out the integration required in (7) and (8) include the
second-order approximation Tsutakawa used and Rubin’s (1987) multiple imputations, a
variant of Monte Carlo integration (Mislevy & Yan, in press, apply this technique to
uncertainty about item parameters). The current presentation employs Lewis’s (1985)
“expected response curve” approach, described below.

Expected  Response  Curves 

In dichotomous IRT models, the expected value of a correct response to Item $j$
given $\theta$ and $B$ is $F_j(\theta) = P(x_j=1 \mid \theta, \beta_j)$. If $\beta_j$ is only partially known, through $p(\beta_j \mid \text{data})$, the
probability of a correct response conditional on $\theta$ but marginal with respect to $B$ can be
written as

$$F_j^*(\theta) = E_{\beta_j}[F_j(\theta)] = \int P(x_j=1 \mid \theta, \beta_j)\, p(\beta_j \mid \text{data})\, d\beta_j,$$

an “expected response curve” that gives the probability of correct response conditional on $\theta$,
taking into account uncertainty about $\beta_j$ (Lewis, 1985).



Even though $F_j^*(\theta)$ is the expected value of a correct response at each value of $\theta$, it
is not the same as $F_j(\theta)$ evaluated with the expected value of $\beta_j$. The shape of $F_j^*$ depends
on the shape of $F_j$ and the shape of $p(\beta_j)$; in general, $F_j^*$ and $F_j$ will not be of the same
functional form. A simple example in which they are may aid intuition. Suppose that $F_j$ is
2-parameter normal (2PN) with slope parameter $a_j$ and difficulty parameter $b_j$; $a_j$ is known
with certainty; and $p(b_j \mid \text{data})$ is $N(\hat{b}_j, \sigma_j^2)$. Then $F_j^*$ is also 2PN, but with $b_j^* = \hat{b}_j$ and

$$a_j^* = (a_j^{-2} + \sigma_j^2)^{-1/2}.$$

In this special case, the location parameter, $b_j^*$, has the same value as the Bayes mean
estimate for $b_j$. The slope parameter, $a_j^*$, is attenuated to account for uncertainty about $b_j$.
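The attenuation formula can be checked with a short derivation. The sketch below is not part of the original presentation; it assumes the 2PN curve is written as $F_j(\theta) = \Phi(a_j(\theta - b_j))$, with $\Phi$ the standard normal distribution function, and it introduces an auxiliary standard normal variable $Z$, independent of $b_j$, purely for the argument.

```latex
% Assumptions (not in the original text): F_j(\theta) = \Phi(a_j(\theta - b_j)),
% Z ~ N(0,1) independent of b_j ~ N(\hat{b}_j, \sigma_j^2).
\begin{align*}
F_j^*(\theta)
  &= E_{b_j}\!\left[\Phi\big(a_j(\theta - b_j)\big)\right]
   = P\big(Z \le a_j(\theta - b_j)\big)
   = P\big(Z + a_j b_j \le a_j \theta\big), \\
Z + a_j b_j &\sim N\big(a_j \hat{b}_j,\; 1 + a_j^2 \sigma_j^2\big), \\
F_j^*(\theta)
  &= \Phi\!\left(\frac{a_j(\theta - \hat{b}_j)}{\sqrt{1 + a_j^2 \sigma_j^2}}\right)
   = \Phi\!\left((a_j^{-2} + \sigma_j^2)^{-1/2}\,(\theta - \hat{b}_j)\right).
\end{align*}
```

Under these assumptions the result is again a 2PN curve, centered at $\hat{b}_j$, with slope $(a_j^{-2} + \sigma_j^2)^{-1/2}$, which is smaller than $a_j$ whenever $\sigma_j^2 > 0$.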

Figures 6 and 7 illustrate the situation. Figure 6 concerns a 2PN curve whose slope
is known to be 1 and whose location is known only up to $p(b) \sim N(0,1)$. The shaded
region suggests this uncertainty with bands drawn at one and two standard deviations
around the curve defined by $b = \hat{b} = 0$. This central curve thus corresponds to the best
estimate of $b$ under squared error loss. Also shown is $F^*$, which is also a 2PN response
curve, and is also centered at 0, but with $a^* = \sqrt{.5} = .7071$. The attenuation toward a
probability of .5 can be understood from Figure 7, a slice of the posterior distribution for
$P(x=1 \mid \theta, b)$ at $\theta=1$ as $b$ ranges from $-\infty$ to $+\infty$. As a result of uncertainty about $b$, the
distribution for the probability of a correct response ranges from 0 to 1. Its mean,
which is required in (8), is lower than the probability associated with the most likely value
of $b$ due to the skew. The mean is shifted toward .5, landing, by definition, at $F^*(1)$.
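As a rough numerical check of the situation in Figures 6 and 7 (not part of the original report), the following Python sketch averages the 2PN curve over $p(b) = N(0,1)$ at $\theta = 1$ and compares the result with the closed-form attenuated curve. The function names, the midpoint-rule integration, and its step count are illustrative assumptions.

```python
import math

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(b, mean, sd):
    """Density of N(mean, sd^2) at b."""
    return math.exp(-0.5 * ((b - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def expected_response_2pn(theta, a, b_mean, b_sd, n_steps=4000, width=8.0):
    """Average Phi(a(theta - b)) over b ~ N(b_mean, b_sd^2) with the midpoint rule."""
    lo = b_mean - width * b_sd
    step = 2.0 * width * b_sd / n_steps
    total = 0.0
    for k in range(n_steps):
        b = lo + (k + 0.5) * step
        total += phi(a * (theta - b)) * normal_pdf(b, b_mean, b_sd) * step
    return total

# Setup of Figure 6: slope known to be 1, location known only up to N(0, 1).
theta, a, b_mean, b_sd = 1.0, 1.0, 0.0, 1.0
numeric = expected_response_2pn(theta, a, b_mean, b_sd)   # F*(1) by integration
a_star = (a ** -2 + b_sd ** 2) ** -0.5                    # attenuated slope
closed_form = phi(a_star * (theta - b_mean))              # F*(1) in closed form
point_estimate = phi(a * (theta - b_mean))                # curve at b = b-hat = 0

print(round(a_star, 4), round(numeric, 4), round(closed_form, 4), round(point_estimate, 4))
```

Under these assumptions the integrated value and the closed form agree at about .76, below the value of about .84 given by the curve at the point estimate of $b$, which is the shift toward .5 described above.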

[Insert  Figures  6  and  7  about  here] 

If the information about items is independent, that is, $p(B \mid \text{data}) = \prod_j p(\beta_j \mid \text{data})$, then
inferences about $\theta$ that take uncertainty about $B$ into account have the same conditional
independence form as when item parameters are known:

$$p(x \mid \theta, \text{data concerning } B) = \prod_{j=1}^{n} F_j^*(\theta)^{x_j}\, [1 - F_j^*(\theta)]^{1-x_j}. \qquad (9)$$



After $x$ is observed, (9) can be interpreted as an expected likelihood function for $\theta$, say
$L(\theta \mid x, \text{data concerning } B)$, or $L(\theta \mid x)$ for short. The posterior $p(\theta \mid x)$ is proportional to
$L(\theta \mid x)\, p(\theta)$, and posterior means and variances for $\theta$ are obtained as usual, except they
take uncertainty about $B$ into account by using $F_j^*$s rather than $F_j$s.

Equation (9) proves useful even if $p(B)$ is not independent over items. Although
the dependencies among items are ignored, (9) is an example of what Arnold and Strauss
(1988) call a “pseudo-likelihood”; under mild regularity conditions on the $F_j^*$s, its
maximum is a consistent estimator of $\theta$. Thus for large $n$, Bayesian and likelihood point
estimates of $\theta$ based on (9) have the correct expectation. Indicators of their uncertainty
based on (9), however, such as the variance estimator of $\hat\theta$ and the posterior variance, tend
to be too optimistic. But if the dependencies among item parameter estimates are small
(and they tend toward zero as test length increases; Mislevy & Sheehan, 1989b), the
underestimation of uncertainty about $\theta$ from this source is minor.
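To illustrate how posterior means and variances for $\theta$ follow from (9), the following Python sketch evaluates the posterior on a grid twice: once with 2PN curves at point estimates of $b_j$, and once with the corresponding expected response curves. Everything specific here is a hypothetical illustration rather than the report's data: the eight items, the common posterior standard deviation of .7 for each $b_j$, the response pattern, the standard normal prior, and the function names.

```python
import math

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def make_2pn(a, b):
    """Return a 2PN response curve with slope a and location b."""
    return lambda theta: phi(a * (theta - b))

def posterior_moments(responses, curves, grid):
    """Posterior mean and variance of theta on a grid, with an N(0,1) prior and
    the conditional-independence likelihood of equation (9)."""
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)       # unnormalized N(0,1) prior
        for x, curve in zip(responses, curves):
            p = curve(theta)
            w *= p if x == 1 else (1.0 - p)      # F^x (1 - F)^(1 - x)
        weights.append(w)
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, grid)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, grid)) / total
    return mean, var

# Hypothetical test: slopes known to be 1, each b_j known only up to N(b_hat, 0.7^2).
b_hats = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
a_known, b_sd = 1.0, 0.7
a_star = (a_known ** -2 + b_sd ** 2) ** -0.5     # attenuated slope of F*

point_curves = [make_2pn(a_known, b) for b in b_hats]      # F_j at point estimates
expected_curves = [make_2pn(a_star, b) for b in b_hats]    # F_j* (expected curves)

responses = [1, 1, 1, 1, 0, 1, 0, 0]                       # hypothetical pattern
grid = [-4.0 + 0.05 * m for m in range(161)]

mean_point, var_point = posterior_moments(responses, point_curves, grid)
mean_expected, var_expected = posterior_moments(responses, expected_curves, grid)
print(round(mean_point, 3), round(var_point, 3))
print(round(mean_expected, 3), round(var_expected, 3))
```

In this sketch the posterior variance computed with the expected response curves is the larger of the two, which is the direction of correction the text describes.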

Expected response curves can also be used for IRT true-score equating, with

$$T_A^*(\theta) = \sum_j F_j^*(\theta). \qquad (10)$$

Since only expectations are involved, (10) is correct whether or not $p(B)$ is
independent over items.
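A minimal sketch of true-score equating with (10) follows. It assumes, purely for illustration, that the expected response curves of two short hypothetical forms have already been approximated by 2PL curves with the (slope, location) pairs listed in the code; the bisection routine and all names are likewise assumptions and not the report's procedure.

```python
import math

def two_pl(theta, a, b):
    """2PL curve standing in for an expected response curve F_j*."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_true_score(theta, items):
    """Equation (10): sum of expected response curves over a form's items."""
    return sum(two_pl(theta, a, b) for a, b in items)

def theta_for_score(score, items, lo=-8.0, hi=8.0, tol=1e-10):
    """Invert the increasing curve T*(theta) by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_true_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equate_a_to_b(score_a, items_a, items_b):
    """Match T_A*(theta) and T_B*(theta) at the theta that yields score_a on Form A."""
    theta = theta_for_score(score_a, items_a)
    return expected_true_score(theta, items_b)

# Hypothetical (slope, location) pairs; every Form B item is 0.4 harder than its Form A twin.
form_a = [(0.8, -1.0), (0.9, -0.3), (0.7, 0.2), (1.0, 0.9)]
form_b = [(0.8, -0.6), (0.9, 0.1), (0.7, 0.6), (1.0, 1.3)]

for score_a in (1.0, 2.0, 3.0):
    print(score_a, round(equate_a_to_b(score_a, form_a, form_b), 3))
```

Because Form B is harder in this made-up setup, each Form A score is matched to a lower Form B score, and equating a form to itself returns the score unchanged.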

Closed-form solutions for $F_j^*$ are not generally available. One way to approximate
$F_j^*$ is outlined below.

1. Lay out a grid of $\theta$ values across the range of interest. Denote by $\theta_m$ the $m$th grid point.

2. For Item $j$, draw a sample of $S$ item parameter values from $p(\beta_j \mid \text{data})$. Denote by $\beta_j^{(s)}$ the
$s$th such draw.

3. Evaluate the probability of a correct response to Item $j$ at $\theta_m$ using each $\beta_j^{(s)}$ in turn, or
$P(x_j=1 \mid \theta=\theta_m, \beta_j=\beta_j^{(s)})$. Denote the result $P_{jm}^{(s)}$.

4. The point on the expected response curve for $\theta=\theta_m$ is approximated by the average of the
values obtained in Step 3:

$$F_j^*(\theta_m) \approx S^{-1} \sum_{s=1}^{S} P_{jm}^{(s)}.$$

Steps 2 and 3 generate an empirical approximation of the predictive distribution of
$P(x_j=1 \mid \theta, \beta_j)$ over the range of $\beta_j$ for fixed values of $\theta$, an example of which appeared as
Figure 7. Step 4 finds the posterior mean for $P$ with respect to $\beta_j$ conditional on each
of the $\theta$ points, giving approximations of the values on the expected response curve. Subsequent
inferences about $\theta$ can be drawn using these values directly in a discrete approximation of
integrals involving the $\theta$ distribution, or after fitting a smooth curve to them.
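The four steps can be sketched in Python as follows. To keep the check simple, the example uses the 2PN case of Figure 6 (slope 1, location drawn from $N(0,1)$), for which the closed-form curve is available for comparison. The function names, the seed, the number of draws, and the use of independent normal draws are illustrative assumptions.

```python
import math
import random

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_response_curve(response_prob, draw_params, grid, n_draws, rng):
    """Steps 1-4: average P(x=1 | theta_m, beta^(s)) over S draws at each grid point."""
    draws = [draw_params(rng) for _ in range(n_draws)]             # Step 2
    curve = []
    for theta_m in grid:                                            # Step 1 grid
        probs = [response_prob(theta_m, beta) for beta in draws]    # Step 3
        curve.append(sum(probs) / n_draws)                          # Step 4
    return curve

def two_pn(theta, beta):
    """2PN response probability for beta = (slope, location)."""
    a, b = beta
    return phi(a * (theta - b))

def draw_beta(rng):
    """Hypothetical predictive distribution: slope fixed at 1, location ~ N(0, 1)."""
    return (1.0, rng.gauss(0.0, 1.0))

grid = [round(-3.0 + 0.2 * m, 1) for m in range(31)]   # -3 to +3 in steps of .2
rng = random.Random(12345)
f_star = expected_response_curve(two_pn, draw_beta, grid, 10000, rng)

# Closed form for this special case: 2PN with slope sqrt(.5), centered at 0.
closed_form = [phi(math.sqrt(0.5) * theta) for theta in grid]
max_gap = max(abs(u - v) for u, v in zip(f_star, closed_form))
print(round(max_gap, 4))
```

The same draws are reused at every grid point, so the approximated curve is increasing in $\theta$ whenever each individual response curve is.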

It is convenient operationally to approximate each $F^*$ with the closest curve from a
familiar family, for example, the closest 3PL curve in applications based on the 3PL
model, or the closest 2PL curve in applications based on the 1PL or 2PL. This approach
makes it possible to use standard software designed for popular parametric IRT models to
estimate examinee scores, construct tests, or draw equating lines; the only difference is
entering item parameters for expected response curves rather than very precise estimates of
true item parameter values. Let $F^{**}$ denote the target approximation. Given $F^*$, a weighted
least squares estimate of $F^{**}$ is obtained by minimizing the fitting function

$$\sum_{m=1}^{M} [F^{**}(\theta_m \mid \beta^{**}) - F^*(\theta_m)]^2\, w(\theta_m)$$

with respect to the parameter $\beta^{**}$ of $F^{**}$, where $w(\theta_m)$ is a weighting function that
specifies the relative importance of matching $F^{**}$ to $F^*$ at various points along the $\theta$ scale.
In practical work, one might create simulated examinees at each $\theta_m$ point in numbers that
reflect the relative importance of fitting $F^{**}$ at those points and with the proportion $F^*(\theta_m)$ of
them with correct answers in each group, then run a logit regression analysis or the
LOGIST computer program (Wingersky, 1983) with the “fixed $\theta$” option to estimate the
parameters $\beta^{**}$ of a best-fitting 2PL or 3PL. Additional information that becomes available
over time, say, as examinee responses are acquired in operational testing, can be
incorporated merely by updating item parameter values under the same model.
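As a rough illustration of the fitting function, the Python sketch below fits a 2PL curve to values of an expected response curve on a grid by direct minimization of the weighted squared differences. It is not the LOGIST or logit-regression route mentioned above; the simple pattern search, the standard normal weights, the target curve (the closed-form $F^*$ from the 2PN example with $a^* = \sqrt{.5}$), and all names are assumptions made for the example.

```python
import math

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_pl(theta, a, b):
    """Candidate approximating curve F** with parameters (a, b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fitting_function(a, b, grid, f_star, weights):
    """Weighted sum of squared differences between F** and F* over the grid."""
    return sum(w * (two_pl(t, a, b) - f) ** 2
               for t, f, w in zip(grid, f_star, weights))

def fit_two_pl(grid, f_star, weights, a=1.0, b=0.0, step=0.5, tol=1e-6):
    """Minimize the fitting function with a simple pattern (compass) search."""
    best = fitting_function(a, b, grid, f_star, weights)
    while step > tol:
        improved = False
        for da, db in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            if a + da <= 0.0:
                continue                      # keep the slope positive
            trial = fitting_function(a + da, b + db, grid, f_star, weights)
            if trial < best:
                a, b, best, improved = a + da, b + db, trial, True
        if not improved:
            step *= 0.5                       # shrink the search step
    return a, b, best

grid = [round(-3.0 + 0.2 * m, 1) for m in range(31)]
f_star = [phi(math.sqrt(0.5) * t) for t in grid]        # target F* values
weights = [math.exp(-0.5 * t * t) for t in grid]        # hypothetical w(theta_m)

a_fit, b_fit, loss = fit_two_pl(grid, f_star, weights)
max_gap = max(abs(two_pl(t, a_fit, b_fit) - f) for t, f in zip(grid, f_star))
print(round(a_fit, 3), round(b_fit, 3), round(max_gap, 4))
```

With a symmetric target and symmetric weights the fitted location stays at 0, and the fitted logistic slope comes out somewhat larger than $a^*$ because of the usual scaling difference between normal and logistic curves.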




An  Example  from  the  PPST  (Part  2) 

Expected response curves for the items of Form 8 were constructed from the
predictive distributions built in Part 1 of the example, with 100 draws of $(a_j, -(b_j a_j), c_j)$ for
each item. Multivariate normal distributions were employed for each item, with means
given by the multiple regression equations and the covariance matrix shown in Table 1. At
each point in a $\theta$ grid from -3 to +3 in steps of .2, the modelled percent-correct
was evaluated from each of the 100 plausible values of $\beta_j$. The averages of these values
at the grid points constituted a discrete, nonparametric estimate of an item’s expected
response curve. For each item, the parameters of the best-fitting 3PL curve were obtained
using the method outlined in the preceding section.

Figure 8 shows, for eight representative items, nonparametric expected response
curves and trace lines generated from baseline item parameters, from point estimates based on
collateral information, and from parameters of 3PL fits to expected response curves. Three
observations can be made from these trace lines, and similar ones for the rest of the items:

1. None of the approximations is impressive as an estimate of the baseline curve, although
again it is their performance as an ensemble that counts.

2. The expected response curves are noticeably shallower than the trace lines based on point
estimates. The uncertainty about the item parameters engenders this “hedging of bets.”

3. The 3PL approximations capture the nonparametric approximations quite well. From this
point, we therefore refer to the 3PL fits as expected response curves.

It  is  essential  to  remember  that  “getting  good  item  parameter  estimates”  is  not  our  objective; 
rather,  it  is  to  express  what  we  know  about  item  parameters  in  a  way  that  gives  us  good 
subsequent  inferences  that  involve  the  unknown  item  parameter  values. 

[Insert  Figure  8  about  here] 

Figure 9 shows the test characteristic curves corresponding to the baseline estimates
and the expected response curves. The bias in the TCC in Figure 2, caused by the
shrinkage of the point estimates of item response curves to their means, has been largely
eliminated. Similar improvements are made in reducing bias for MLEs, as can be seen by
comparing Figure 10 with Figure 4. Figure 11, which should be compared with Figure 3,
shows the improvement in the estimated true-score equating line between Form 8 and Form
7. Figure 12 shows the test information curves (TICs) corresponding to the baseline item
parameter estimates, the point predictions generated in Part 1 of the example, and the
expected response curves. The reciprocals of the values on these curves are approximate
squared standard errors for MLEs of $\theta$s along the x-axis. The TIC based on point
predictions, because it ignores uncertainty about item parameters, is misleadingly high,
even higher than the TIC based on baseline estimates in the region where the predicted
difficulties are centered. The TIC based on expected response curves is appropriately
lower, about 33 percent lower than the baseline TIC on the average. Figure 13 shows the
proportional increase in the standard errors of the 250 examinees. Since information is
additive over items, one would have to administer 58 items to obtain the same precision
about a typical examinee’s $\theta$ when using expected response curves, compared to using 39
items whose true parameters were known with certainty. This is a more honest estimate of
the impact of using items whose parameters are known only through their modest
relationships with available collateral information, to be weighed against the costs of
obtaining information from a large calibration sample of examinees.
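The test-length comparison can be reproduced with a line of arithmetic. The sketch below assumes that the 33 percent average reduction in information applies evenly across items, so that information grows in proportion to the number of comparable items; the symbols $n$, $n'$, and $r$ are introduced here only for the illustration.

```latex
% Assumptions: n = 39 items with known parameters, r = .33 average proportional
% loss of information when expected response curves are used instead.
\[
n'(1 - r) = n
\quad\Longrightarrow\quad
n' = \frac{n}{1 - r} = \frac{39}{1 - .33} \approx 58.2 ,
\]
% that is, about 58 items scored with expected response curves.
```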

[Insert  Figures  9-13  about  here] 

As mentioned above, the predictive distributions built in Part 1 can also be used as
prior distributions to augment information from examinee response data. This was done
with a modified version of BILOG, using responses from a new sample of 250 Form 8
examinees. Multivariate normal posterior distributions were obtained, with Bayes
modal estimates as means and covariance matrices for each item that reflected the sum of
precision from the collateral-information based prior and 250 examinee responses. 3PL
approximations to expected response curves were again generated. Figures 14 and 15 are
the resulting TCC and TIC, and Figures 16 and 17 are the MLEs and standard errors for
the same sample of 250 examinees used in Figures 10 and 13. The TCC and individual
MLEs are now quite accurate, in the sense of agreeing with estimates obtained with item
parameter estimates from the baseline sample. Posterior variances for examinees’ $\theta$s
practically match those obtainable with baseline item parameter estimates.

[Insert  Figures  14-17  about  here] 

By  exploiting  collateral  information  about  items  in  a  framework  that  appropriately 
accounts  for  the  remaining  uncertainty,  it  was  possible  in  this  example  to  obtain  consistent 
estimates  of  examinee  abilities  and  honestly  state  the  uncertainty  about  them — with  no 
response  data  at  all  for  the  items  used  to  measure  the  examinees.  Using  the  same  collateral 
data  to  generate  a  prior  distribution  for  item  parameters,  a  supplemental  calibration  sample 
of  250  examinees  provided  estimates  nearly  indistinguishable  from  those  obtained  with  the 
baseline  item  parameters  with  5000  responses  or  more  per  item. 

Conclusion 

The  title  of  this  paper  is  a  bit  of  a  come-on;  the  techniques  we  describe  don’t  really 
equate  tests  without  any  data  at  all.  The  point  is,  though,  that  the  data  they  require  are  not 
the  same  pretesting-  and  equating-sample  examinee  data  upon  which  previous  equating 
procedures  have  traditionally  relied.  Years  of  research  have  shown  that  collateral 
information  about  items  can  be  predictive  of  item  operating  characteristics.  Recent 
developments  in  statistical  methodologies  make  it  possible  to  exploit  this  information  in  the 
equating  problem,  while  giving  an  honest  account  of  the  consequences  of  the  remaining 
uncertainties.  There  is  no  assurance  that  the  collateral  information  about  items  available  in 
any  particular  application  will  be  sufficiently  rich  to  eliminate  or  substantially  reduce 
pretesting  and  equating.  This  remains  to  be  discovered  case  by  case.  We  now  hope  to 
explore  the  potential  of  the  approach  in  a  variety  of  settings. 


References 




Angoff, W.H. (1984). Scales, norms, and equivalent scores. Princeton: Educational
Testing Service.

Arnold,  B.C.,  &  Strauss,  D.  (1988).  Pseudolikelihood  estimation.  Technical  Report  No. 
164.  Riverside,  CA:  Department  of  Statistics,  University  of  California. 

Bejar, I.I. (1983). Subject matter experts' assessment of item statistics. Applied
Psychological Measurement, 7, 303-310.

Bejar,  I.I.  (1985).  Speculations  on  the  future  of  test  design.  In  S.E.  Embretson  (Ed.), 
Test  design:  Developments  in  psychology  and  psychometrics  (pp.  279-294). 
Orlando:  Academic  Press. 

Bock,  R.D.  &  Aitkin,  M.  (1981).  Marginal  maximum  likelihood  estimation  of  item 
parameters:  An  application  of  an  EM  algorithm.  Psychometrika,  46, 443-459. 

Chaffin,  R.,  &  Peirce,  L.  (1988).  A  taxonomy  of  semantic  relations  for  the  classification 
of  GRE  analogy  items.  Research  Report  RR-87-50.  Princeton,  NJ:  Educational 
Testing  Service. 

Chalifour, C., & Powers, D.E. (1989). The relationship of content characteristics of GRE
analytical reasoning items to their difficulties and discriminations. Journal of
Educational Measurement, 26, 120-132.

Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. (1972). The dependability
of behavioral measurements: Theory of generalizability for scores and profiles.
New York: Wiley.

Dorans,  N.  (1990).  Equating  methods  and  sampling  designs.  Applied  Measurement  in 
Education,  3,  3-17. 

Drum, P.A., Calfee, R.C., & Cook, L.K. (1981). The effects of surface structure
variables on performance in reading comprehension tests. Reading Research
Quarterly, 16, 486-514.



Embretson, S.E. (Ed.) (1985). Test design: Developments in psychology and
psychometrics. Orlando: Academic Press.

Enright, M.K., & Bejar, I.I. (1989). An analysis of test writers' expertise: Modeling
analogy item difficulty. Research Report RR-89-35. Princeton, NJ: Educational
Testing Service.

Fischer, G.H. (1973). The linear logistic test model as an instrument of educational
research. Acta Psychologica, 37, 359-374.

Fischer,  G.H.,  &  Formann,  A.K.  (1982).  Some  applications  of  the  logistic  latent  trait 
models  with  linear  constraints  on  the  parameters.  Applied  Psychological 
Measurement,  6,  397-416. 

Guttman,  L.  (1959).  A  structural  theory  for  inter-group  beliefs  and  action.  American 
Sociological  Review,  24,  318-328. 

Hambleton,  R.K.  (1989).  Principles  and  selected  applications  of  item  response  theory.  In 
R.L.  Linn  (Ed.),  Educational  measurement  (3rd  ed.)  (pp.  147-200).  New  York: 
American  Council  of  Education/Macmillan. 

Hively, W., Patterson, H.L., & Page, S.H. (1968). A "universe-defined" system of
arithmetic achievement tests. Journal of Educational Measurement, 5, 275-290.

Irvine,  S.H.,  Dann,  P.L.,  &  Anderson,  J.D.  (in  press).  Towards  a  theory  of  algorithm- 
determined  cognitive  test  construction.  British  Journal  of  Psychology. 

Lewis,  C.  (1985).  Estimating  individual  abilities  with  imperfectly  known  item  response 
functions.  Paper  presented  at  the  Annual  Meeting  of  the  Psychometric  Society, 
Nashville  TN,  June,  1985. 

Lord,  F.M.  (1980).  Applications  of  item  response  theory  to  practical  testing  problems. 
Hillsdale,  NJ:  Erlbaum. 

Lord,  F.M.  (1982).  Item  response  theory  and  equating — A  technical  summary.  In  P.W. 
Holland  &  D.B.  Rubin  (Eds.),  Test  equating  (pp.  141-148).  New  York:  Academic 
Press. 



Lorge, I., & Kruglov, L. (1952). A suggested technique for the improvement of difficulty
prediction of test items. Educational and Psychological Measurement, 12, 554-561.

Lorge,  I.,  &  Kruglov,  L.  (1953).  The  improvement  of  estimates  of  test  difficulty. 
Educational  and  Psychological  Measurement,  13, 34-46. 

Mayer,  R.E.  (1981).  Frequency  norms  and  structural  analysis  of  algebra  story  problems 
into  families,  categories,  and  templates.  Instructional  Science,  10, 135-175. 

Mislevy, R.J. (1986). Bayes modal estimation in item response models. Psychometrika,
51, 177-196.

Mislevy,  R.J.  (1988).  Exploiting  auxiliary  information  about  items  in  the  estimation  of 
Rasch  item  difficulty  parameters.  Applied  Psychological  Measurement,  12,  281- 
296. 

Mislevy, R.J., & Bock, R.D. (1983). BILOG: Item analysis and test scoring with binary
logistic models [computer program]. Mooresville, IN: Scientific Software, Inc.

Mislevy, R.J., & Sheehan, K.M. (1989a). The role of collateral information about
examinees in item parameter estimation. Psychometrika, 54, 661-679.

Mislevy, R.J., & Sheehan, K.M. (1989b). Information matrices in latent-variable models.
Journal of Educational Statistics, 14, 335-350.

Mislevy, R.J., & Yan, D. (in press). Dealing with uncertainty about item parameters:
Multiple imputations and SIR. RR-92-xx-ONR. Princeton: Educational Testing
Service.

Mitchell, K.J. (1983). Cognitive processing determinants of item difficulty on the verbal
subtests of the Armed Services Vocational Aptitude Battery. Technical Report 598.
Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social
Sciences.

Osburn, H.G. (1968). Item sampling for achievement testing. Educational and
Psychological Measurement, 28, 95-104.



Rubin,  D.B.  (1987).  Multiple  imputation  for  nonresponse  in  surveys.  New  York:  Wiley. 

Scheiblechner, H. (1972). Das Lernen und Lösen komplexer Denkaufgaben. Zeitschrift für
Experimentelle und Angewandte Psychologie, 19, 476-506.

Scheuneman,  J.,  Gerritz,  K.,  &  Embretson,  S.  (1989).  Effects  of  prose  complexity  on 
achievement  test  item  difficulty.  Paper  presented  at  the  annual  meeting  of  the 
American  Educational  Research  Association,  San  Francisco,  CA,  March  1989. 

Tatsuoka, K.K. (1987). Validation of cognitive sensitivity for item response curves.
Journal of Educational Measurement, 24, 233-245.

Thorndike, R.L. (1982). Item and score conversion by pooled judgment. In P.W.
Holland & D.B. Rubin (Eds.), Test equating (pp. 309-326). New York: Academic
Press.

Tinkelman, S. (1947). Difficulty prediction of test items. Teachers College Contributions
to Education, No. 941. New York: Teachers College, Columbia University.

Tsutakawa,  R.K.,  &  Johnson,  J.  (1990).  The  effect  of  uncertainty  of  item  parameter 
estimation  on  ability  estimates.  Psychometrika,  55,  371-390. 

Tsutakawa,  R.K.,  &  Lin,  H.Y.  (1986).  Bayesian  estimation  of  item  response  curves. 
Psychometrika,  51, 251-267. 

Tsutakawa, R.K., & Soltys, M.J. (1988). Approximation for Bayesian ability estimation.
Journal of Educational Statistics, 13, 117-130.

Whitely,  S.E.  (1976).  Solving  verbal  analogies:  Some  cognitive  components  of 
intelligence  test  items.  Journal  of  Educational  Psychology,  68, 234-242. 

Wingersky, M.S. (1983). LOGIST: A program for computing maximum likelihood
procedures for logistic test models. In R.K. Hambleton (Ed.), Applications of item
response theory. Vancouver, B.C.: Educational Research Institute of British
Columbia.


Table 1

Descriptive Statistics and Parameter Estimates from Multivariate Regression Model

                                     Correlation with                  Parameters in Regression Model
                                     Item Difficulty                   (Slope, Intercept, Lower Asymptote;
Variable                             Rater 1   Rater 2   [illegible]   column positions not recoverable)

The Item Passage
  3 Syllable Words per 100 Words       .14       .20        .91        -.02321
  Sentences per 100 Words              .01       .01        .93         .11101

The Item Stem
  Closed?                              .11       .10        .99        -.19720
  Hidden Negative?                     .00       .00        .99        -.16061
  Line References?                     .11       .11        .96        -.48298

The Options
  # Arguments                          .18       .26        .93        -.07365   -.00190

Aspects of Targeted Solution Strategy
  Translate Active & Passive          -.16      -.05        .90         .19295    .36407
  Translate Positive & Negative        .04       .15        .95        -.74103
  Process Single Sentence             -.08      -.18        .83         .12783
  # Steps                              .30       .20        .70        -.11304

Residual Covariance Matrix
                     Slope      Intercept   Lower Asymptote
  Slope              .05156
  Intercept          .01821     .49404
  Lower Asymptote   -.00130    -.00161      .00121


List  of  Figures 


1. Point Predictions of Item Parameters versus Baseline Estimates

2. Test Characteristic Curves from Point Predictions of Item Parameters and Baseline
Estimates

3. IRT True-Score Equating Curves based on Point Predictions of Item Parameters and
Baseline Estimates

4. Examinee MLEs based on Point Predictions of Item Parameters and Baseline Estimates

5. Comparison of Examinee Standard Errors Calculated with Point Predictions of Item
Parameters and Baseline Estimates in Place of True Item Parameters

6. The Effect of Uncertainty about b on Estimated Probabilities of Correct Response

7. Distribution for the Probability of a Correct Response at $\theta=1$ Induced by the Uncertainty
about b

8. Item Trace Lines Calculated with Baseline Estimates and Point Predictions of Item
Parameters, and Parametric and Nonparametric Expected Response Curves

9. Test Characteristic Curves from Expected Response Curves and Baseline Estimates of Item
Parameters

10. Examinee MLEs based on Expected Response Curves and Baseline Estimates of Item
Parameters

11. IRT True-Score Equating Curves based on Expected Response Curves and Baseline
Estimates of Item Parameters

12. Test Information Curves based on Expected Response Curves, and Point Predictions and
Baseline Estimates of Item Parameters

13. Comparison of Examinee Standard Errors Calculated with Expected Response Curves and
with Baseline Estimates of True Item Parameters

14. Test Characteristic Curves from Baseline Estimates of Item Parameters and Expected
Response Curves based on Collateral Information and 250 Examinees

15. Test Information Curves based on Baseline Estimates of Item Parameters and Expected
Response Curves from Collateral Information and 250 Examinees

16. Examinee MLEs based on Baseline Estimates of Item Parameters and Expected Response
Curves from Collateral Information and 250 Examinees

17. Comparison of Examinee Standard Errors Calculated with Baseline Estimates of Item
Parameters and Expected Response Curves from Collateral Information and 250 Examinees


PREDICTED  bPD_m_n  “PREDICTED 


1.5 


FIGURE  1 


Point  Predictions  of  Item  Parameters  versus  Baseline  Estimates 


TEST  CHARACTERISTIC  CURVE 


1.0 


o  o  4- 

-3 


T 


-2 


— i - r— 

-1  0 

ABILITY 


2  3 


BASELINE 


PREDICTED 


FIGURE  2 

Test  Characteristic  Curves  from  Point  Predictions  of 
Item  Parameters  and  Baseline  Estimate 


BASELINE  -  PREDICTED 


FIGURE  3 

IRT  Truc-Score  Equating  Curves  based  on  Point  Predictions  of 
Item  Parameters  and  Baseline  Estimates 


°BASEUNT 


FIGURE  4 

Examinee  MI.Es  based  on  Point  Predictions  of 
Item  Parameters  and  Baseline  Estimates 


-2 


-1 


1 


2 


3 


0 

eBA3IUNI 

FIGURE  5 


Comparison  of  Examinee  Standard  Errors  Calculated  with  Point  Predictions  of 
Item  Parameters  and  Baseline  Estimates  in  Place  of  True  Item  Parameters 


theta 


-  •  m  -  "best  estimate"  of  response  curve 
—o—o—  posterior  expection  of  response  probability 

>  |  response  curve  based  on  estimated  b  i  1  SE 

fTTTTTTTTTTl  response  curve  based  on  estimated  b  1  2  SE 


The  Effect  of  Uncertainty  about  b  on  Estimated  Probabilities  of  Correct  Response 

Figure  6 


e=  1.0 


Distribution  for  the  Probability  of  a  Correa  Response  at  0=1 
Induced  by  Uncertainty  about  b 


Figure  7 


ability  ability 

BASELINE  -  PREDICTED  ♦  NONPARAMETRIC  -  EXPECTED 


FIGURE  8 

Item  Trace  Lines  Calculated  with  Baseline  Estimates  and  Point  Predictions  of  Item 
Parameters,  and  Parametric  and  Nonparametric  Expected  Response  Curves 


2-10  1  2 

ABILITY 

BASELINE  -  EXPECTED 

FIGURE  9 

Test  Characteristic  Curves  from  Expected  Response  Curves 
and  Baseline  Estimates  of  Item  Parameters 


FORM  8  -  PROPORTION  CORRECT  SCORE 


BASELINE  -  EXPECTED 


FIGURE  11 

IRT  True-Score  Equa  ,  Curves  based  on  Expected  Response  Curves 
and  Baseline  Estimates  of  Item  Parameters 


ABILITY 

BASELINE  -  PREDICTED  -  EXPECTED 

FIGURE  12 

Test  Information  Curves  based  on  Expected  Response  Curves, 
and  Point  Predictions  and  Baseline  Estimate'  of  Item  Parameters 


FIGURE  13 


Comparison  of  Examinee  Standard  Errors  Calculated  with  Expected  Response  Curves 
and  with  Baseline  Estimates  of  True  Item  Parameters 


FIGURE 14

Test Characteristic Curves from Baseline Estimates of Item Parameters and Expected
Response Curves based on Collateral Information and 250 Examinees

[Horizontal axis: ABILITY. Legend: BASELINE; COLLATERAL INFO. PLUS 250 S's]


FIGURE 15

Test Information Curves based on Baseline Estimates of Item Parameters and Expected
Response Curves from Collateral Information and 250 Examinees

[Vertical axis: TEST INFORMATION CURVE. Horizontal axis: ABILITY. Legend: BASELINE; COLLATERAL INFO. PLUS 250 S's]


FIGURE 16

Examinee MLEs based on Baseline Estimates of Item Parameters and Expected
Response Curves from Collateral Information and 250 Examinees

[Axis: BASELINE]


FIGURE 17

Comparison of Examinee Standard Errors Calculated with Baseline Estimates of Item
Parameters and Expected Response Curves from Collateral Information and 250 Examinees

[Vertical axis: RELATIVE INCREASE IN STANDARD ERROR]


